[issue-6019] [P SDK] Fix cache tokens not tracked in Bedrock Claude streaming #6038

Closed
ollieagent[bot] wants to merge 5181 commits into main from ollie/issue-6019-add-cache-tokens-to-usage-057507

Conversation

ollieagent Bot commented Apr 1, 2026

Contributor

Details

In ClaudeAggregator.aggregate(), the message_start chunk handler only extracted input_tokens and output_tokens from the usage dict. The cache token fields (cache_creation_input_tokens, cache_read_input_tokens) were present in the Anthropic streaming response but were never read, so anthropic_to_bedrock_usage() always received zeros for them. As a result, cacheWriteInputTokens and cacheReadInputTokens were always 0 in the logged span usage.

The fix initializes both cache token variables before the chunk loop, extracts them from the message_start chunk's message.usage dict, and includes them in the call to anthropic_to_bedrock_usage(). The non-streaming path was already correct (it passes the full usage dict from the response body).
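
A minimal sketch of the streaming fix described above, not the actual SDK code. The chunk layout, the `aggregate_usage` helper, and the body and signature of `anthropic_to_bedrock_usage` are assumptions made for illustration; only the field names come from this description.

```python
from typing import Any, Dict, Iterable


def anthropic_to_bedrock_usage(usage: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical stand-in for the converter named in this PR.

    The inputTokens/outputTokens keys are assumed; the two cache keys
    are the ones this PR says were stuck at 0.
    """
    return {
        "inputTokens": usage.get("input_tokens", 0),
        "outputTokens": usage.get("output_tokens", 0),
        "cacheWriteInputTokens": usage.get("cache_creation_input_tokens", 0),
        "cacheReadInputTokens": usage.get("cache_read_input_tokens", 0),
    }


def aggregate_usage(chunks: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Collect usage from streamed chunks (only message_start is handled here)."""
    # Initialize every counter before the loop so absent fields default to 0.
    input_tokens = 0
    output_tokens = 0
    cache_creation_input_tokens = 0
    cache_read_input_tokens = 0

    for chunk in chunks:
        if chunk.get("type") != "message_start":
            continue  # a real aggregator would also process other chunk types
        usage = chunk.get("message", {}).get("usage", {})
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        # The two reads below are the part that was previously missing.
        cache_creation_input_tokens = usage.get("cache_creation_input_tokens", 0)
        cache_read_input_tokens = usage.get("cache_read_input_tokens", 0)

    return anthropic_to_bedrock_usage(
        {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cache_creation_input_tokens": cache_creation_input_tokens,
            "cache_read_input_tokens": cache_read_input_tokens,
        }
    )
```

Initializing the counters outside the loop keeps the zero default when a stream carries no cache fields, which is the behavior the second unit test checks.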

Change checklist

  • User facing
  • Documentation update

Issues

Testing

New unit tests in tests/unit/integrations/bedrock/test_claude_aggregator.py:

  • test_claude_aggregator__cache_tokens_in_message_start__included_in_usage — verifies that cacheWriteInputTokens and cacheReadInputTokens are populated from the message_start chunk when cache tokens are present.
  • test_claude_aggregator__no_cache_tokens__defaults_to_zero — verifies the zero-default behavior is preserved when cache tokens are absent.

To run:

cd sdks/python
python -m pytest tests/unit/integrations/bedrock/test_claude_aggregator.py -v

Documentation

No documentation updates needed.

CometActions and others added 30 commits March 18, 2026 14:14
Co-authored-by: Andres Cruz <andresc@comet.com>
…t-scoped operations (#5694)

* [OPIK-4932] [OPIK-4937] [BE] Add project_id to experiments table and support project scoping

Add project_id column with minmax index to ClickHouse experiments table.
Update ExperimentService, ExperimentDAO, and ExperimentsResource to support
project_id on create (via projectId or projectName) and filtering on list.

* Revision 2: Make resolveProjectId reactive in experiment creation chain

Chain resolveProjectId as a flatMap in the reactive pipeline instead of
calling it synchronously, avoiding blocking the reactive chain.

* Revision 3: Move project_id filter into UNION branches and fix empty-string bug

- Move project_id predicate from outer WHERE into each UNION branch for
  FIND and FIND_GROUPS queries, so aggregations are filtered early
- Fix arrayConcat to not append '' when experiment project_id is null,
  which was incorrectly triggering the project_deleted predicate
- Use agg.project_ids directly in aggregated branches since
  experiments_from_aggregates_final already has the project_id

* [OPIK-4932] [BE] fix: use createPartialExperiment in DatasetsResourceTest

Use experimentResourceClient.createPartialExperiment() instead of
factory.manufacturePojo(Experiment.class) to avoid PODAM generating
random projectId values that fail validation after project_id support
was added to experiments.

* [OPIK-4932] [BE] fix: capitalize SQL FROM keyword in ExperimentDAO

* [OPIK-4932] [BE] fix: address PR comments — non-nullable project_id, alias refs, materialize index

- Change project_id from Nullable(FixedString(36)) to FixedString(36) DEFAULT ''
  to avoid Nullable performance overhead in ClickHouse
- Replace coalesce/isNull with if(notEmpty(...))/empty() for non-nullable column
- Reference pre-computed combined_project_ids/project_ids aliases in WHERE clauses
  instead of repeating arrayConcat expressions
- Add MATERIALIZE INDEX to populate idx_project_id on existing data
- Remove unnecessary toString() calls since project_id is already a string

* [OPIK-4932] [BE] fix: use String instead of FixedString(36) for project_id column

* [OPIK-4932] [BE] fix: align migration with liquibase conventions

- Group rollbacks at end of changeset
- Remove inline comments between statements
- Fix --rollback empty format (no semicolon)

* [OPIK-4932] [BE] fix: rename index to idx_experiments_project_id and use GRANULARITY 1

* [NA] [BE] fix: rename migration 000068 to 000070 to avoid conflict with main

* [OPIK-4932] [BE] test: fix createExperimentWithProjectName and createExperimentWithNewProjectName to include projectId in expected

---------
…d stream endpoints (#5713)

* [OPIK-4934] [BE] feat: add project_name filter to dataset retrieve and stream endpoints

* [OPIK-4934] [BE] test: fix compilation and add project_name filter tests

- Fix positional DatasetIdentifier/DatasetItemStreamRequest constructors
  to use builder pattern after new fields were added
- Add ProjectService mock to DatasetsResourceIntegrationTest constructor
- Add 4 new integration tests covering project_name filter behavior:
  - getDatasetByIdentifier with valid project_name returns scoped dataset
  - getDatasetByIdentifier with non-existing project_name falls back gracefully
  - streamDatasetItems with valid project_name returns scoped items
  - streamDatasetItems with non-existing project_name falls back gracefully

* [OPIK-4934] [BE] fix: address PR review comments

- Rename resolveProjectName to resolveProjectIdByName for clarity
- Only set resolved projectId when non-null to avoid clobbering an
  existing projectId on the request
- Log only datasetName and projectId instead of full request object
  to avoid exposing user-supplied filter strings in logs

* [OPIK-4934] [BE] refactor: add findProjectIdByName helper to ProjectService

Extract the repeated projectService.findByNames(...).stream().findFirst().map(Project::id)
pattern into a shared Optional<UUID> findProjectIdByName(workspaceId, projectName) method
on ProjectService. DatasetsResource.resolveProjectIdByName() now delegates to it.

* [OPIK-4934] [BE] fix: address PR review comments

- Add @JsonIgnore to projectId in DatasetItemStreamRequest to prevent
  client deserialization of server-internal field
- Guard resolveProjectIdByName in streamDatasetItems so client-supplied
  projectId is never clobbered by name resolution
- Introduce DatasetCriteria overload on DatasetService.findByName for
  consistency with the find() API
- Use DatasetCriteria in DatasetsResource.getDatasetByIdentifier
- Add getDatasetByIdentifier/callGetDatasetByIdentifier to
  DatasetResourceClient test helper; refactor inline REST calls to use it
- Use builder pattern for DatasetIdentifier in tests
- Add streaming test covering project_id filter

* fix(stream): use client projectId directly when present instead of null

Use request.projectId() as-is when the client supplies it, falling back
to resolveProjectIdByName only when projectId is absent. The previous
form set resolvedProjectId to null on the else-branch, which happened to
work because resolvedRequest fell back to the original request (which
already carried projectId), but was misleading and fragile.
* [OPIK-4713] add new permission

* [OPIK-4713] add permission checks
* [OPIK-5044] [BE] feat: add P2 workspace permission annotations

Add P2 scope permissions from the workspace permissions spec:
- WORKSPACE_SETTINGS_CONFIGURE for workspace config upsert/delete
- USER_ROLE_UPDATE (enum only, no endpoint yet)
- AI_PROVIDER_UPDATE for LLM provider key create/update/delete
- ANNOTATION_QUEUE_ANNOTATE for adding items to annotation queues

Includes RequiredPermissionsTest for all annotated endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(permissions): add permission denial tests for P2 endpoints

Add tests verifying endpoints return 403 when the auth service denies
the required permission. Covers all P2-annotated endpoints plus existing
ANNOTATION_QUEUE_DELETE endpoints.

Adds AuthTestUtils.mockTargetWorkspaceDenyPermission() helper and
call* client methods for AnnotationQueuesResourceClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andres Cruz <andresc@comet.com>
)

Co-authored-by: Andres Cruz <andresc@comet.com>
…CompareRedirect (#5733)

* e2e fix

* fix(tests): apply prettier formatting to OptimizationCompareRedirect.test.tsx
…lete system overview (#5735)

* [OPIK-5096] [DOCS] docs: update self-host architecture page with complete system overview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(architecture): add updated draw.io architecture diagram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Updated image

* docs(architecture): add ClickHouse and MySQL schema draw.io diagrams

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Updated images

* Optimised images with calibre/image-actions

---------
…ments (#5693)

* [OPIK-4615] [DOCS] docs: update dashboard documentation for GA refinements

Rewrite dashboards.mdx to reflect the OPIK-4615 dashboard overhaul:
- Remove beta notice, templates section, and "Working with templates"
- Add dashboard types (Multi-project/Experiments) and scope (Workspace/Insights)
- Add Insights tab section (built-in Project Overview, custom views, views selector, auto-save)
- Add Leaderboard widget documentation
- Update widget docs (per-widget project selector, unified modal, filter/group)
- Update saving behavior (workspace Save/Discard, Insights auto-save, built-in read-only)
- Update date range filtering (decoupled from save state)
- Replace old screenshots with new ones, remove unused images
- Update production_monitoring.mdx to reference Insights tab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Optimised images with calibre/image-actions

* docs(dashboards): add experiment pages to Insights description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(dashboards): document breakdown fields per data source and aggregation toggle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(dashboards): rename project overview image to dashboard_example.png

Fixes missing image reference in changelog.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…nt (#5724)

* delete env endpoint

* address comments

* remove unneeded validation
* OPIK-4449 initial

* OPIK-4449 experiment unified, removed FF

* OPIK-4449 execution policy + evaluators

* OPIK-4449 fix linter

* OPIK-4449 fix deps circular issue

* OPIK-4449 fix description column accessor and polish eval suite UI

Fix table column reading description from row.data.description instead of
row.description. Also remove card shadows, reduce textarea min rows, and
clean up editItem merge logic in draft store.

* OPIK-4449 fix toggling off item-level execution policy override

Stop stripping undefined values in the draft store so that
execution_policy: undefined is preserved as an explicit "clear override"
signal. Use the `in` operator in UI components to distinguish absent keys
from explicit undefined, and emit clear_execution_policy flag in the
save payload for the backend.

* OPIK-4449 rename remaining "dataset" labels to "evaluation suite" in UI

* OPIK-4449 generate descriptions and assertions when expanding eval suite items with AI

Enhance the "Expand with AI" feature to automatically generate item-level
descriptions and LLM judge assertions for evaluation suite items. The prompt
now requests _opik_description and _opik_evaluator_assertions magic keys in
the generated JSON, which are extracted and converted to proper item fields
before adding to the draft store. Suite-level evaluator assertions are
included in the prompt context to avoid duplication. The generated samples
preview dialog now shows descriptions and assertions inline.

* OPIK-4449 fix prettier lint in ExecutionPolicyCell

* OPIK-4449 experiment updates

* OPIK-4449 merge fixes

* refactor

* OPIK-4449 open full panel with all fields when creating eval suite item

Wire "Create evaluation suite item" button to create a draft item in
the store and open EvaluationSuiteItemPanel instead of the legacy
AddDatasetItemSidebar/AddEditDatasetItemDialog. The panel detects new
items via draftStatus and adjusts UI accordingly (title, dropdown,
navigation). Regular datasets keep the existing flow unchanged.

* [OPIK-4449] Rename internal evaluator references to assertion

Rename evaluator-converters.ts to assertion-converters.ts and update
all internal function names and imports to use assertion-centric naming.
Rename OPIK_EVALUATOR_ASSERTIONS_FIELD constant to OPIK_ASSERTIONS_FIELD.
API-facing field names (evaluators on DatasetItem/DatasetVersion) are
preserved unchanged.

* OPIK-4449 new Figma updates

* Fix lint issues: remove unused useMemo import and fix prettier formatting

* Fix prettier formatting in remaining eval suite files

* OPIK-4449 refactoring

* OPIK-4449 fix issues

* OPIK-4449 experiment UI updates

* OPIK-4449 pass rate

* OPIK-4449 merge main

* [OPIK-5030] [FE] Fix UI crash on unfinished eval suite experiments

Guard pass_rate with isNumber() before assigning to scores object,
preventing null/undefined from propagating into chart tick calculator.
Harden useChartTickDefaultConfig to filter non-finite values and
bail early in generateNiceTicks, avoiding infinite loop and RangeError.

* OPIK-4449 fix issues

* [OPIK-5039] [FE] Fix view evaluation item button and lint

* OPIK-4449 add type for create modal

* OPIK-4449 remove internal plan/design docs from branch

* OPIK-4449 fix prettier formatting issues
…ling (#5720)

* [NA] [SDK] feat: instrument Anthropic beta API and fix compaction billing

- Patch `client.beta.messages.create` and `client.beta.messages.stream`
  in `track_anthropic` so beta API calls are traced like the standard API
- Add `patch_sync/async_beta_message_stream_manager` in stream_patchers.py
  to handle `BetaMessageStreamManager`/`BetaAsyncMessageStreamManager`
- Add `AnthropicUsage.get_billable_tokens()` that sums all `iterations`
  when compaction fires (top-level tokens exclude the compaction iteration)
  https://platform.claude.com/docs/en/build-with-claude/compaction#understanding-usage
- Store raw `usage.iterations` in span metadata for visibility
- Add integration tests for beta create/stream (sync and async)
- Add unit tests for compaction iteration billing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(anthropic): include cache tokens per iteration when compaction+caching combined

SDK types (BetaMessageIterationUsage, BetaCompactionIterationUsage) confirm that
cache_creation_input_tokens and cache_read_input_tokens are always present on each
iteration. Mirror the non-compaction path: add cache_read to prompt and
cache_creation to completion for every iteration.

Also fix test data to be consistent with the doc: top-level input/output_tokens
reflect only the non-compaction iterations, not 45000/1234 which was inconsistent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(anthropic): cache_creation_input_tokens is input cost not output

total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
https://platform.claude.com/docs/en/build-with-claude/prompt-caching#tracking-cache-performance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
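
The formula in this commit message can be illustrated with a short sketch. The `total_input_tokens` helper and the plain-dict usage shape are hypothetical; the SDK's own usage types may differ.

```python
from typing import Mapping, Optional


def total_input_tokens(usage: Mapping[str, Optional[int]]) -> int:
    """Sum every input-side token count, treating missing or None fields as 0."""
    return (
        (usage.get("input_tokens") or 0)
        + (usage.get("cache_read_input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0)
    )
```

Cache writes are counted on the input side here, matching the commit's point that cache_creation_input_tokens is an input cost rather than an output cost.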

* fix(anthropic): address Baz review comments

- Guard beta stream imports with try/except ImportError; set module-level
  sentinels to None so the rest of the module loads on older anthropic SDK versions
- Use inline try/except ImportError for beta isinstance checks in
  _streams_handler, following the existing SDK pattern (see crewai patcher)
- Fix logger calls: remove stray str(exception) arg with no %s placeholder
- Narrow except Exception -> except AttributeError when patching beta methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
… and schema flattening (#5707)

* [NA] [SDK] fix: flatten LLMJudge JSON schema to avoid $ref indirection

Inline all $defs/$ref references in the ResponseSchema JSON schema sent
to providers. This reduces grammar-constrained decoding complexity for
Anthropic models (especially Haiku) which frequently return incomplete
JSON when schemas use $ref indirection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
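
A rough sketch of what inlining `$defs`/`$ref` references can look like, assuming all definitions live in a top-level `$defs` and all references use the `#/$defs/<name>` form. The `flatten_schema` name is hypothetical, and the depth limit of 50 is borrowed from the later review-feedback commit on this branch; the actual `_resolve_refs` implementation may differ.

```python
import copy
from typing import Any, Dict

MAX_DEPTH = 50  # guard against self-referencing definitions
REF_PREFIX = "#/$defs/"


def flatten_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` with every local $ref replaced by its definition."""
    defs = schema.get("$defs", {})

    def resolve(node: Any, depth: int) -> Any:
        if depth > MAX_DEPTH:
            raise ValueError("schema nesting too deep; possible recursive $ref")
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(REF_PREFIX):
                target = copy.deepcopy(defs[ref[len(REF_PREFIX):]])
                # Keys written next to $ref override the referenced definition.
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                return resolve({**target, **siblings}, depth + 1)
            # Drop the $defs table itself; everything else is resolved recursively.
            return {k: resolve(v, depth + 1) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item, depth + 1) for item in node]
        return node

    return resolve(schema, 0)
```

The input schema is left untouched; a recursive definition raises `ValueError` instead of looping forever.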

* feat(llm-judge): add tenacity retry and parse validation

Parse now validates result count and scoring_failed status, raising
LLMJudgeParseError with partial results attached. The retry decorator
on _generate_and_parse retries up to 3 times on parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(litellm): extract structured output from tool_calls when content is null

litellm implements response_format via tool_use for Anthropic models.
Under concurrent load ~5% of responses arrive with content=null and the
JSON in tool_calls[0].function.arguments instead. Only extracts from
tool calls named "json_tool_call" (litellm's structured output marker)
to avoid interfering with real tool use responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
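
The fallback described in this commit might look roughly like the sketch below. It models the response message as a plain dict and uses a hypothetical `extract_structured_content` helper; the real code operates on litellm response objects, whose attribute access is not shown here.

```python
from typing import Any, Dict, Optional

STRUCTURED_OUTPUT_TOOL_NAME = "json_tool_call"  # marker name taken from the commit message


def extract_structured_content(message: Dict[str, Any]) -> Optional[str]:
    """Return the message content, falling back to the structured-output tool call."""
    content = message.get("content")
    if content is not None:
        return content

    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        return None

    function = tool_calls[0].get("function") or {}
    # Only the structured-output marker is unwrapped; real tool use is left alone.
    if function.get("name") != STRUCTURED_OUTPUT_TOOL_NAME:
        return None
    return function.get("arguments")
```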

* feat(anthropic): add native AnthropicChatModel for evaluation

Cherry-picked from alexkuzmik/native-anthropic-model. Adds a native
Anthropic client that bypasses litellm for anthropic/ prefixed models,
with proper param filtering and structured output via tool_use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(anthropic): use messages.parse for native structured output

Switch from tool_use workaround to Anthropic's native structured output
API (messages.parse with output_format). The SDK handles schema
transformation internally and returns JSON as text content, eliminating
the tool_use indirection entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(anthropic): add 60s timeout to Anthropic client connections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix lint errors

* feat(anthropic): add tracking for messages.parse structured output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(llm-judge): address PR review feedback

- Handle None input in ResponseSchema.parse to avoid TypeError crash
- Remove `stream` from _SUPPORTED_PARAMS to prevent streaming responses in generate_string
- Log LLMJudgeParseError with stack trace before returning partial results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address petrotiurin review feedback

- Revert unintended package-lock.json change
- Extract DEFAULT_MAX_TOKENS constant for easier discovery
- Add depth limit (50) to _resolve_refs recursion to prevent loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(anthropic): add tracking for beta.messages.parse

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unintended frontend and optimizer files from branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove optimizer reflection logs accidentally added in merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* [OPIK-5097] [FE] chore: add v1/v2 scaffolding, docs, and agent rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-5097] [FE] refactor: move components/ into ui/, shared/, v1/ structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…layer (#5727)

* [OPIK-4934] [BE] refactor: move project name resolution into service layer

Address andrescrz review feedback: resolveProjectIdByName logic belongs
in the service, not the resource.

- Add DatasetService.findByName(workspaceId, DatasetIdentifier, visibility)
  overload that resolves project name internally via ProjectService
- Inject ProjectService into DatasetItemServiceImpl; move project
  name->id resolution inside getItems so the resource no longer needs it
- Remove resolveProjectIdByName helper, ProjectService field and import
  from DatasetsResource
- Fix DatasetsResourceIntegrationTest to match updated constructor

* [OPIK-4934] [BE] refactor: address baz review — strict project validation and reactive resolution

- DatasetService.findByName(identifier): throw NotFoundException when projectName is
  provided but no matching project exists, instead of silently falling back to a
  workspace-wide search
- DatasetItemService.getItems: remove pre-reactive ProjectService call; delegate
  project resolution to DatasetService.findByName(DatasetIdentifier) inside
  Mono.fromCallable so blocking lookup runs on boundedElastic scheduler
- Remove ProjectService injection from DatasetItemServiceImpl (no longer needed)

* fix(tests): restore fallback and ignore ttft in trace assertions

- DatasetService.findByName: revert orElseThrow to orElse(null) so that
  a non-existent project name falls back to searching without project
  scope, preserving the documented contract tested by
  getDatasetByIdentifier__whenNonExistingProjectName and
  streamDataItems__whenNonExistingProjectName
- TraceAssertions: add ttft to IGNORED_FIELDS_TRACES to prevent flaky
  failures from Double precision loss when ClickHouse stores/returns the
  field (findWithImageTruncation parameterized tests were affected)

* fix(tests): compare ttft with Double tolerance instead of ignoring

Replace the blanket ttft ignore with a proper Double comparator.
findWithImageTruncation now builds a RecursiveComparisonConfiguration
with StatsUtils::compareDoubles (abs tolerance 1e-6) so the field
value is still verified but floating-point ULP differences from
ClickHouse storage are tolerated. Consistent with assertTraces()
which already used this comparator for all Double fields.

* Update GetTracesByProjectResourceTest.java
* [OPIK-4712] Add new permission

* [OPIK-4712] Hide create/clone dashboard CTAs

* [OPIK-4712] disable template editing

* [OPIK-4712] do not disable view switching

* [OPIK-4712] remove redundancies

* [OPIK-4712] remove redundancies

* [OPIK-4712] remove view permission check from insights

* [OPIK-4712] remove view permission check from insights

* [OPIK-4712] remove redundancies
* [OPIK-4449] [FE] revert UI labels from "Evaluation suites" back to "Datasets"

Revert user-facing text to show "Datasets" until evaluation suites feature is released.
Hide the type selector in the create modal so it always creates a dataset.

* [OPIK-4449] [FE] fix: prettier formatting
…o prompt version endpoints (#5736)

* [OPIK-4935] [BE] feat(api): add project_name and project_id scoping to prompt version endpoints

Add project_name and project_id fields to POST /v1/private/prompts/versions/retrieve
and POST /v1/private/prompts/versions, allowing callers to scope prompt lookups
and creation to a specific project.

- PromptVersionRetrieve: add optional project_name (filters lookup to given project)
- CreatePromptVersion: add optional project_id (takes precedence) and project_name
- PromptService: extract resolveProjectId() helper; use strict DAO lookup in
  retrievePromptVersion to prevent fallback to workspace-level when project_name
  is explicitly provided; validate project_id existence when supplied
- PromptResourceTest: update all PromptVersionRetrieve call sites to use builder,
  add project_name scoping and validation tests

Implements OPIK-4935

* fix(prompts): throw NotFoundException when project_name is provided but not found in retrieve

* fix(prompt): restore workspace-level fallback in retrievePromptVersion

Use the private findByName helper (which already handles the
project → workspace fallback) instead of calling promptDAO.findByName
directly, so retrieval behaves consistently with the creation path.

* test(prompt): update retrieve fallback test to match workspace-wide fallback design

When the project-level lookup misses, retrievePromptVersion falls back
to a workspace-wide search. Updated the test to assert 200 (found via
fallback) instead of 404.

* test(prompt): add missing retrieve scenarios for project-scope fallback

- workspace-level prompt found when projectName is specified (fallback)
- 404 when projectName does not exist in the workspace

* [OPIK-4981] [BE] Fall back to workspace-wide in retrievePromptVersion when project not found

When projectName is provided but does not resolve to an existing project,
pass null as projectId so the lookup falls through to workspace-wide search
instead of throwing 404.
* initial

* pr comments

* pr comments
…riment and optimization queries (#5745)

* [OPIK-5150] [BE] perf: replace FINAL with ORDER BY/LIMIT 1 BY in experiment and optimization queries

Replace ClickHouse FINAL modifier with explicit ORDER BY DESC + LIMIT 1 BY
pattern across ExperimentDAO and OptimizationDAO queries for improved query
performance in optimization studio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review - two-layer dedup pattern, remove redundant LIMIT 1 BY, narrow SELECTs

- Use dedup-then-filter subquery pattern: inner query deduplicates with
  immutable sort-key filters only, outer query applies mutable column
  filters (name, type, optimization_id, tags, etc.) to prevent phantom rows
- Remove LIMIT 1 BY from feedback_scores/authored_feedback_scores CTEs
  (redundant with downstream ROW_NUMBER, and missing author key would
  drop authored scores)
- Narrow SELECT * to specific columns in spans and experiment_aggregates
  helper CTEs where only a few columns flow downstream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ce fallback (#5744)

* [OPIK-4962] Add deprecation header handling to HTTPX client

* Implemented logic to log deprecation warnings for responses containing the X-Opik-Deprecation header, ensuring warnings appear only once per path during the session.
* Added related unit tests to validate behavior for scenarios with and without deprecation headers.
* Refactored HTTPX client to include warning tracking and integrate logging functionality.

* [OPIK-4962] Improve HTTPX client deprecation warning handling and fix tests

- Ensure deprecation warnings are logged once per method and path combination.
- Fix unit tests to validate warnings with updated message formatting.
- Add client closing to improve testing cleanup.
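
The once-per-method-and-path behavior can be sketched as a small tracker. The `DeprecationWarningTracker` class and its `check` method are hypothetical, and how the real client hooks this into HTTPX responses is not shown; only the header name comes from the commit message.

```python
import logging
from typing import Mapping, Set, Tuple

LOGGER = logging.getLogger(__name__)
DEPRECATION_HEADER = "X-Opik-Deprecation"


class DeprecationWarningTracker:
    """Log a deprecation warning at most once per (method, path) pair."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def check(self, method: str, path: str, headers: Mapping[str, str]) -> bool:
        """Return True if a warning was logged for this response."""
        # A plain dict lookup is case-sensitive; a case-insensitive header
        # mapping would also match lowercase header names.
        message = headers.get(DEPRECATION_HEADER)
        if message is None:
            return False
        key = (method.upper(), path)
        if key in self._seen:
            return False
        self._seen.add(key)
        LOGGER.warning("Deprecated endpoint %s %s: %s", key[0], path, message)
        return True
```

Keeping the seen-set on an instance rather than at module level scopes the deduplication to one client session and makes it easy to reset in tests.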
* [OPIK-4942] [BE] POC: Separate assertion_results table (Option E)

Demonstrates the architecture for storing assertion results in a dedicated
ClickHouse table instead of piggybacking on feedback_scores with
category_name='suite_assertion'.

Changes:
- New assertion_results ClickHouse table (migration 000070)
- AssertionResultDAO for writing assertion data to the new table
- FeedbackScoreDAO splits writes: assertions -> assertion_results, regular -> feedback_scores
- ExperimentItemDAO STREAM query adds assertion_results_per_trace CTE
- ExperimentItemMapper passes assertions_array to enrichWithAssertions
- AssertionResultMapper reads from dedicated column instead of partitioning feedback scores

Not included in this POC (would be needed for production):
- DatasetItemDAO/DatasetItemVersionDAO assertion CTE changes
- ExperimentAggregatesDAO pass rate aggregation from new table
- REST endpoint exclude_category_names cleanup
- Data migration for existing installations
- SDK changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Fix test compilation: inline suite_assertion constant

The SUITE_ASSERTION_CATEGORY constant was removed from AssertionResultMapper
in the Option E refactor, but the test still referenced it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Add assertion_results_per_trace CTE to compare endpoint

- Add assertion_results_per_trace CTE to DatasetItemVersionDAO (both
  has_aggregated and has_raw branches) — compare endpoint was using
  DatasetItemVersionDAO, not DatasetItemDAO which had the CTE
- Add arp.assertions_array at tuple index 19 in both branches
- Remove group.size() <= 1 guard in AssertionResultMapper.computeRunSummaries()
  so run summaries are emitted when a dataset item has 1 run per experiment
- Add assertion_scores_avg Map column to experiment_aggregates (migration 000071)
- Add AssertionScoreAverage API record

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Revert package-lock.json — unintentional change from lint hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Fix test compilation errors

- AssertionResultMapperTest: update enrichWithAssertions calls to use
  (item, jsonString) signature; rewrite tests for assertion_results
  table approach (no longer reads from feedbackScores); update
  computeRunSummaries_singleRun test to reflect removed group.size()<=1 guard
- ExperimentsResourceTest: remove extra null arg from getFeedbackScoreNames
  calls (leftover from older branch version of the method)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Bump migration numbers to 071 and 072

000070 conflicts with 000070_add_project_id_to_experiments.sql from main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Use JsonUtils import in AssertionResultMapper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Update test: runSummaries emitted for single-run suite experiments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Remove misleading comment from runSummaries test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Narrow catch to JsonProcessingException in AssertionResultMapper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Add assertionScores to EXPERIMENT_IGNORED_FIELDS in test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Add assertionScores assertions to PassRate tests + fix score routing

- Add .categoryName("suite_assertion") to PassRate.score() helper so
  scores route to assertion_results table (required for pass rate SQL)
- Fix itemThreshold test: set per-item executionPolicy in createDatasetItems
  instead of applyDatasetItemChanges to avoid version-2 row-ID mismatch
- Add assertionScores assertions to 4 tests: thenReturnPassRate (2/3),
  multipleAssertions (scoreName1=1.0, scoreName2=0.5), passThresholdNotMet
  (1/3), and itemThreshold (4/6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Address PR review comments: AssertionStatus enum, JsonUtils, SQL cleanup

- Use JsonUtils.readValue() instead of getMapper().readValue() (comment #1)
- Replace explicit CAST with tuple() in SQL for type flexibility (comments #2, #3)
- Change passed column from UInt8 to Enum8('passed'=0,'failed'=1) (comment #4)
- Add AssertionStatus enum used end-to-end from DB to API response
- Update all SQL queries using toFloat64(passed) to toFloat64(passed = 'passed')
- Add project_id filter to assertion_results query in DatasetItemVersionDAO (comment #6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Fix test compilation: use AssertionStatus enum instead of boolean assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Add AssertionResultService + fix toJSONString tuple serialization

Move assertion score routing from FeedbackScoreDAO to service layer via
dedicated AssertionResultService. Fix assertion_results query where
toJSONString(tuple(...)) produced arrays instead of objects — use CAST
with named Tuple type so toJSONString emits JSON objects matching
AssertionResultRow record.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Remove suite_assertion exclusion tests from Traces and Projects

suite_assertion scores now go to the separate assertion_results table,
so exclude_category_names filtering is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4942] [BE] Return boolean passed field in AssertionResult API response

Map AssertionStatus enum to boolean in AssertionResultMapper so
SDK/FE consumers receive passed: true/false instead of passed/failed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Andres Cruz <andresc@comet.com>
…ements (#5715)

* use config to setup LocalRunnerReaperJob

* create string RedissonClient bean

* Use reactive Redis client for nextJob long-poll

* refactor heartbeat

* add reaperMaxRunnersPerCycle

* address comments

* refactor local runner reaper

* moved polled job to active list in a transactional manner

* address comments

* fix potential race condition

* address comments
thiagohora and others added 18 commits April 1, 2026 11:08
…ually loads (#5954)

* [OPIK-4828] [FE] fix: prevent No Data flash and stale thread data in ThreadDetailsPanel

- Remove keepPreviousData from useThreadById and useTracesList to avoid
  showing the previous thread's data when switching between threads
- Change && to || in renderContent loading check so that a Loader is
  shown whenever either query is pending, not only when both are pending
  simultaneously (which caused No Data to appear if traces resolved first)

* feat(db): add bloom filter skip index on traces.thread_id

OPIK-4828: thread_id had no skip index, causing full partition scans
when filtering traces by thread in the thread view. The bloom filter
at 1% false positive rate allows ClickHouse to skip ~99% of irrelevant
granules for thread_id lookups.

* fix(threads): enable skip index on FINAL scan in findById query

Add SETTINGS use_skip_indexes_if_final = 1 so the bloom filter on
traces.thread_id is applied even when reading with FINAL, allowing
ClickHouse to skip irrelevant granules during thread view loading.

* [OPIK-4828] [BE] perf: remove FINAL from append-only feedback score tables in thread queries

Remove FINAL modifier from feedback_scores and authored_feedback_scores
reads across all thread DAO queries. authored_feedback_scores is
append-only so FINAL is unnecessary overhead; feedback_scores reads are
deduplicated via ROW_NUMBER() window function downstream so FINAL is
also redundant.

Also optimise SELECT_TRACES_THREAD_BY_ID with a two-phase CTE approach:
narrow to matching trace IDs first (with FINAL), then read full rows
without FINAL using LIMIT 1 BY for deduplication. Benchmarked ~5x
faster on production for threads with large trace counts.

* Fix issue

* fix(threads): remove trailing whitespace in ThreadDAO SQL queries

* [OPIK-4828] [BE] Address PR review: move use_skip_indexes_if_final to global config, fix redundant dedup

* [OPIK-4828] [BE] Remove remaining per-query use_skip_indexes_if_final settings

* Fix tests

* Update and rename 000076_add_thread_id_skip_index_to_traces.sql to 000077_add_thread_id_skip_index_to_traces.sql
* init kpi cards;

* finish kpi cards and graph;

* refactor;

* eslint issues;

* baz review comments;

* eslint issues;

* revert endtime;

---------

Co-authored-by: aadereiko <aliaksandr@comet.com>
… in runner (#5952)

* [OPIK-5326] [SDK] feat: cast job input values to declared param types in runner

Add input type casting to both the TypeScript and Python in-process runner
loops so agent functions receive correctly-typed arguments regardless of how
the server serialised the values in job.inputs.

TypeScript: export castInputValue() from InProcessRunnerLoop, apply it in
invokeAgent using each Param's declared type (boolean / number / string).

Python: add cast_input_value() to in_process_loop, apply it in _execute_job
for all keys that match a registered param (bool / int / float / str); keys
such as opik_args that are not in params pass through unchanged.

Both implementations follow the same pattern as typeHelpers.ts: primitives
are cast natively, complex types (dict/list) are JSON-serialised as strings,
and null/None passes through unchanged.
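
The casting rules described above could be sketched in Python roughly as follows. This is an illustration only, not the SDK's `cast_input_value()`: the name `cast_input_value_sketch`, the type-name strings, and the choice to raise `TypeError` on bad input are assumptions made for the example.

```python
import json
from typing import Any


def cast_input_value_sketch(value: Any, type_name: str) -> Any:
    """Hypothetical sketch: cast a job input to a declared primitive type."""
    # null/None passes through unchanged.
    if value is None:
        return None

    if type_name == "bool":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text == "true":
            return True
        if text == "false":
            return False
        raise TypeError(f"cannot cast {value!r} to bool")

    if type_name in ("int", "float"):
        if isinstance(value, bool):
            # Assumption: booleans are not silently treated as numbers.
            raise TypeError(f"cannot cast bool {value!r} to {type_name}")
        try:
            return int(value) if type_name == "int" else float(value)
        except (TypeError, ValueError) as exc:
            raise TypeError(f"cannot cast {value!r} to {type_name}") from exc

    if type_name == "str":
        # Complex types (dict/list) are JSON-serialised as strings.
        if isinstance(value, (dict, list)):
            return json.dumps(value)
        return str(value)

    # Keys without a known declared type pass through unchanged.
    return value
```

A caller would apply this only to keys that match a registered param, leaving other keys such as `opik_args` untouched.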

Unit tests added for both SDKs using parametrisation to cover each type
individually and a set of multi-param combination scenarios.

Implements OPIK-5326

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-5326] [SDK] refactor: reuse type_helpers deserialization in runner casting

Address baz-reviewer feedback on PR #5952:

TypeScript (comment #3009010633): castInputValue now delegates to
deserializeValue() from typeHelpers.ts for string→boolean and string→number
conversions. Adds a Number.isNaN guard so non-numeric strings (e.g. "abc")
throw TypeError instead of silently passing NaN to the agent function.

Python (comment #3009010639): extract_params now calls unwrap_optional()
from type_helpers.py before extracting the type name, so Optional[int]
annotations correctly store type="int" instead of the raw
"typing.Optional[int]" string. cast_input_value is rewritten to delegate to
backend_value_to_python_value() from type_helpers.py, unifying the
conversion logic across AgentConfig and the runner.

Python (comment #3009010648): renamed all test functions in
test_cast_input_value.py to follow the repo convention
test_WHAT__CASE_DESCRIPTION__EXPECTED_RESULT. Added tests for Optional[T]
unwrapping in extract_params and for the new backend type name aliases
("boolean", "integer", "string").

Skipping comment #3009010644 (bool "1"→True): strict "true"/"false" only
behaviour is intentional and mirrors the TypeScript SDK — the backend
serialises booleans as true/false, not "1"/"yes".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-5326] [SDK] refactor: move type_helpers to shared path

type_helpers.py / typeHelpers.ts were under agent_config but are now
used by the runner as well. Move them to a neutral location:

  Python:     api_objects/agent_config/type_helpers.py
           →  api_objects/type_helpers.py

  TypeScript: agent-config/typeHelpers.ts
           →  typeHelpers.ts  (opik package root)

Update all import sites in both SDKs: agent_config internals
(config, blueprint, base, AgentConfig, Blueprint, index), the runner
(in_process_loop, registry), the client (Client.ts), and all
corresponding test files.  No logic changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-5303] [SDK] refactor: use module-form import for type_helpers in in_process_loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [OPIK-5326] [SDK] refactor: standardise input casting around backend type names

- Remove backend_type param from backend_value_to_python_value (was unused)
- Raise TypeError for truncating int casts ("3.9") and bool→int coercion
- registry.extract_params now emits backend type names (integer/boolean/string)
  so Param.type is consistent with what the server expects
- cast_input_value delegates directly to backend_type_to_python_type; the
  dual Python/backend name lookup (type_name_to_python_type) is removed
- Add _execute_job integration tests covering multi-typed params in both
  Python and TypeScript

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [NA] revert package-lock.json to main

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(runner): update extract_params assertions to backend type names

* fix(runner): treat '1'/'yes' as truthy bool; report cast errors as failed jobs

- Extend bool casting to accept "1" and "yes" as truthy values
- Move input casting inside _execute_job's try/except so TypeError
  from invalid inputs is reported to the backend as a failed job
  instead of propagating uncaught

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(runner): use fake timers in typed-params TS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…lude workspace (#6022)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Migration 000061_add_catch_up_columns_to_retention_rules was already
executed in staging/prod. It was mistakenly renamed to 000062 and its
index was changed in-place, which would break Liquibase checksums.

Restore 000061 to its original form and create a new 000062 migration
that drops the old index and recreates it with the corrected column
order (catch_up_done, enabled, apply_to_past, catch_up_cursor).
…ed runners (#6024)

Online scoring E2E tests were consistently timing out (60s) on
GitHub-hosted runners in post-merge CI, while passing on self-hosted
runners. The scoring pipeline (rule activation -> trace creation ->
LLM API call -> score storage) takes longer on resource-constrained
GitHub-hosted runners.

- Increase test timeout from 60s to 120s
- Increase polling attempts from 15 to 25
- Increase page refresh wait from 2s to 3s

Co-authored-by: Andrei Căutișanu <andreicautisanu@ip-192-168-1-128.eu-west-1.compute.internal>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, and UX improvements (#6011)

* [NA] [TS SDK] feat: indexed keys in LLMJudge schema, reasoning_effort, and UX improvements

Port Python SDK PRs #5690 and #5677 to TypeScript SDK:

- Use indexed keys (assertion_1, assertion_2) instead of assertion text as
  JSON schema property names for cross-provider compatibility (Anthropic,
  OpenAI character limits)
- Refactor buildResponseSchema/parseResponse into ResponseSchema class
- Add reasoningEffort option to LLMJudge (defaults to "low")
- Add ---BEGIN/END--- delimiters around input/output in LLM judge prompt
- Move dashboard link inside result box, remove "Uploading results" message
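
The indexed-key idea in the first bullet can be sketched as below. This is a hedged illustration in Python (the language of the PRs being ported), not the actual `ResponseSchema` class: the function names and the per-assertion shape (`passed`, `reason`) are assumptions made for the example.

```python
from typing import Any, Dict, List


def build_indexed_schema(assertions: List[str]) -> Dict[str, Any]:
    """Hypothetical sketch: use assertion_1, assertion_2, ... as property names."""
    properties: Dict[str, Any] = {}
    for index, text in enumerate(assertions, start=1):
        # The assertion text moves into the description, so the property
        # name stays short regardless of how long the assertion is.
        properties[f"assertion_{index}"] = {
            "type": "object",
            "description": text,
            "properties": {
                "passed": {"type": "boolean"},
                "reason": {"type": "string"},
            },
            "required": ["passed", "reason"],
        }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


def parse_indexed_response(
    assertions: List[str], response: Dict[str, Any]
) -> Dict[str, Any]:
    """Map indexed keys in the model response back to the assertion text."""
    return {
        text: response[f"assertion_{index}"]
        for index, text in enumerate(assertions, start=1)
    }
```

Keeping the mapping in one place means the schema builder and the parser cannot drift apart on how keys are numbered.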

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ts-sdk): fromConfig reads description over name for cross-source compatibility

Ensures UI-created LLM judge configs (where name="Correctness" but
description="Whether the output is correct") deserialize correctly.
Also fixes variables format to match Python SDK / backend convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ts-sdk): always show experiment link, even without metrics

The result box with the dashboard link was only displayed when metrics
were present. Moved getUrl() to processResults so the link is always
shown, fixing the evaluate.test.ts regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ts-sdk): treat experiment dashboard link as best-effort

Wrap experiment.getUrl() in try/catch so a missing dataset doesn't
crash the evaluation results flow. The dashboard link is skipped
gracefully if the URL cannot be resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ts-sdk): mock createExperimentItems in evaluateWithVersion test

The test was missing a mock for Experiment.insert's underlying API call,
causing unhandled 401 rejections in CI after test teardown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ts-sdk): address PR review — cache ResponseSchema, pass reasoningEffort, remove dead code

- Cache ResponseSchema as instance field instead of recreating on every
  score()/toConfig() call
- Pass reasoning_effort to generateProviderResponse so the LLM actually
  receives it at runtime
- Remove unused assertions field from ResponseSchema class

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng (#6021)

* [OPIK-5452] [SDK] feat: warn when runner process exits without blocking

When a user's script exits without a blocking call (e.g., no server framework),
the runner never processes jobs. This adds detection via signal tracking — if the
process exits cleanly (no SIGTERM/SIGINT), a warning is printed advising the user
to use a server framework like uvicorn/Flask (Python) or express/fastify (TypeScript).
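
A minimal Python sketch of the signal-tracking approach described above, assuming a hypothetical `ExitWatcher` class (the real SDK names and warning text are not shown in this commit message). It uses only the standard `signal` and `atexit` modules.

```python
import atexit
import signal
import sys
from types import FrameType
from typing import Optional


class ExitWatcher:
    """Hypothetical sketch: warn if the process exits without SIGTERM/SIGINT."""

    def __init__(self) -> None:
        self.signal_received = False

    def record_signal(self, signum: int) -> None:
        # Remember that the exit was requested by a signal.
        self.signal_received = True

    def should_warn(self) -> bool:
        # A clean exit with no signal suggests the script never blocked.
        return not self.signal_received

    def _handle_signal(self, signum: int, frame: Optional[FrameType]) -> None:
        self.record_signal(signum)
        sys.exit(128 + signum)

    def _warn_at_exit(self) -> None:
        if self.should_warn():
            print(
                "Runner process exited without blocking; jobs will not be "
                "processed. Consider running inside a server framework.",
                file=sys.stderr,
            )

    def install(self) -> bool:
        """Install handlers; return False (and skip atexit) if that fails."""
        try:
            signal.signal(signal.SIGTERM, self._handle_signal)
            signal.signal(signal.SIGINT, self._handle_signal)
        except ValueError:
            # signal.signal() raises ValueError outside the main thread.
            return False
        atexit.register(self._warn_at_exit)
        return True
```

Returning a bool from `install()` lets the caller skip the warning entirely when handlers could not be installed, since the exit reason would then be unknowable.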

Implements OPIK-5452

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(runner): address PR review comments

- Consolidate test imports to use only module-level import
- Make install_signal_handlers return bool; skip atexit when handlers fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_run() to pass project_name to the OpikClient (#5956)

* [OPIK-5297] Add `project_name` to optimization creation in `base_optimizer.py`

* [OPIK-5297] Add assertion for `project_name` propagation in optimization creation test

* [OPIK-5297] Add `source="optimization"` to span-related test cases

* [OPIK-5297] Add `project_name` support to optimizer dataset creation and associated tests

* [OPIK-5297] Refactor test fixtures to improve `project_name` propagation and environment setup

* [OPIK-5297] Refactor `setup_environment` fixture to use `pytest.MonkeyPatch.context` for improved environment variable management

* [OPIK-5297] Set `setup_environment` fixture scope to `session` in E2E tests

* [OPIK-5297] Set `setup_environment` fixture to `autouse` and fix import paths in E2E tests

* [OPIK-5297] Update `setup_driving_hazard_dataset` fixture to use `Generator` for improved type safety

* [OPIK-5297] Introduce old dataset cleanup and improve `setup_environment` fixture in E2E tests

* [OPIK-5297] Added more detailed logging

* [OPIK-5297] Fixed failing unit test

* [OPIK-5297] Refactor dataset creation to use cached client and centralize dataset cleanup logic

* [OPIK-5297] Improve logging, typing, and validation across dataset utilities and test fixtures

* [OPIK-5297] Removed unnecessary client.end() call

* [OPIK-5297] Add Opik server setup and health checks to E2E test workflow

* [OPIK-5297] Add verbose summary report to pytest output in E2E workflow

* [OPIK-5297] Fixed to avoid race conditions during E2E test execution

* [OPIK-5297] Extend E2E workflow timeout to 40 minutes
…#6009)

* [OPIK-5427] [FE] Prettify agent sandbox output with SyntaxHighlighter

Replace raw <pre> dump with existing SyntaxHighlighter component,
enabling Pretty (markdown), JSON, and YAML view modes with copy support.

* Revision 2: Handle null agent results in SyntaxHighlighter

Use undefined as the no-data sentinel so completed jobs with null
results still render the SyntaxHighlighter (showing output: null)
instead of silently hiding the result section.
#6001)

* Status filter for local runners

* address comments

* address comments
Co-authored-by: aadereiko <aliaksandr@comet.com>
* [NA] [BE] fix: filter spans by trace_id in project metrics queries

Span subqueries in GET_COST, GET_COST_WITH_BREAKDOWN, GET_TOKEN_USAGE,
and GET_TOKEN_USAGE_WITH_BREAKDOWN were not scoping spans to the traces
returned by traces_filtered. Adding AND trace_id IN (SELECT id FROM
traces_filtered) ensures spans are only aggregated for traces that pass
all applied filters (time range, name, metadata, feedback scores, etc.).

Benchmarked on production (1.9M spans, 7-day window):
- Granules read: 25,429 → 4,959 (5x reduction)
- GET_TOKEN_USAGE latency: ~2.0s → ~0.6s median (3.5x faster)
- GET_COST latency: ~1.6s → ~0.9s median (1.7x faster)

* [NA] [BE] fix: scope span subqueries to traces_filtered and add created_at index on authored_feedback_scores

- Add AND trace_id IN (SELECT id FROM traces_filtered) to span subqueries in
  GET_COST, GET_COST_WITH_BREAKDOWN, GET_TOKEN_USAGE, GET_TOKEN_USAGE_WITH_BREAKDOWN.
  Previously filtering by span.id (5th ORDER BY column) caused full-table scans;
  the fix reduces granules read from 25,429 to 4,959 (~5x) and query latency by ~2x.
- Replace inline dateDiff duration expressions with the MATERIALIZED duration column
  in TRACE_FILTERED_PREFIX, SPAN_FILTERED_PREFIX, and GET_AVERAGE_DURATION.
- Remove FINAL from feedback_scores and authored_feedback_scores reads in
  TRACE_FILTERED_PREFIX, SPAN_FILTERED_PREFIX, and THREAD_FILTERED_PREFIX;
  deduplication relies on the ROW_NUMBER() window function that is already applied.
- Scope traces_final in THREAD_FILTERED_PREFIX to only traces whose thread_id is
  in the selected time window (was previously loading all threads in the project).
- Add minmax skip index on authored_feedback_scores.created_at (migration 000076).

* Update and rename 000076_add_minmax_index_authored_feedback_scores_created_at.sql to 000078_add_minmax_index_authored_feedback_scores_created_at.sql
)

* [OPIK-5479] [FE] fix: clear cached pair code on runner connection

When a runner connects, the backend immediately consumes the pairing
code (Redis getAndDelete). However the frontend kept the stale code
in React Query cache. If the runner later disconnected, the empty
state re-displayed the expired code, causing users to attempt
reconnection with a code that would always fail.

Fix: use queryClient.removeQueries() to evict the pair code cache
as soon as isConnected becomes true. On subsequent disconnection
React Query sees no cached data and fetches a fresh code on demand.

* [OPIK-5479] [FE] test: add unit tests for pair code cache invalidation

Tests verify that:
- Empty state shows pair code when disconnected
- Connected state renders when runner is connected
- Pair code cache is cleared (removeQueries) on connection
- Pair code cache is NOT cleared when disconnected

* [OPIK-5479] [FE] fix: lint errors in test file (display names)

* [OPIK-5479] [FE] fix: typecheck error — use vi.spyOn return type
…ing usage

Extract cache_creation_input_tokens and cache_read_input_tokens from the
message_start chunk in ClaudeAggregator.aggregate() and pass them to
anthropic_to_bedrock_usage(), so cacheWriteInputTokens and
cacheReadInputTokens are correctly tracked in streaming responses.
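
The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ClaudeAggregator.aggregate()` code: the helper name `collect_streaming_usage` and the chunk dict shape are assumptions, and because the real `anthropic_to_bedrock_usage()` signature is not shown here, the sketch builds the Bedrock-style usage dict inline.

```python
from typing import Any, Dict, Iterable


def collect_streaming_usage(chunks: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical helper: read token counts from the message_start chunk."""
    # Initialise every counter before the loop so absent fields default to zero.
    input_tokens = 0
    output_tokens = 0
    cache_creation_input_tokens = 0
    cache_read_input_tokens = 0

    for chunk in chunks:
        if chunk.get("type") != "message_start":
            continue
        usage = chunk.get("message", {}).get("usage", {})
        input_tokens = usage.get("input_tokens") or 0
        output_tokens = usage.get("output_tokens") or 0
        # The two cache fields are the ones that were previously never read.
        cache_creation_input_tokens = usage.get("cache_creation_input_tokens") or 0
        cache_read_input_tokens = usage.get("cache_read_input_tokens") or 0

    # Assumed mapping from Anthropic field names to Bedrock-style names.
    return {
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "cacheWriteInputTokens": cache_creation_input_tokens,
        "cacheReadInputTokens": cache_read_input_tokens,
    }
```

Using `or 0` keeps the zero-default behaviour when a cache field is missing or explicitly null.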

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the python, tests, and Python SDK labels Apr 1, 2026
@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

Python SDK Unit Tests Results (Python 3.11)

1 tests   0 ✅  7s ⏱️
1 suites  0 💤
1 files    0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit a4953bc.

@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

Python SDK Unit Tests Results (Python 3.12)

1 tests   0 ✅  7s ⏱️
1 suites  0 💤
1 files    0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit a4953bc.

@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

Python SDK Unit Tests Results (Python 3.14)

1 tests   0 ✅  4s ⏱️
1 suites  0 💤
1 files    0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit a4953bc.

@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

Python SDK Unit Tests Results (Python 3.13)

1 tests   0 ✅  8s ⏱️
1 suites  0 💤
1 files    0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit a4953bc.

@github-actions

github-actions Bot commented Apr 1, 2026

Contributor

Python SDK Unit Tests Results (Python 3.10)

1 tests   0 ✅  8s ⏱️
1 suites  0 💤
1 files    0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit a4953bc.

Labels

Made by Ollie 🦉, Python SDK, python, tests

Development

Successfully merging this pull request may close these issues.

[Bug]: bedrock claude adapter doesn't save cache tokens usage